Multi-label Classification without the Multi-label Cost
Abstract
Multi-label classification, in which the same example can belong to more than one class label, arises in many applications; typical examples include image and video annotation, functional genomics, social network annotation, and text categorization. Existing methods are limited in both efficiency and accuracy. In this paper, we propose an extension of decision tree ensembles that addresses both challenges. We formally analyze the learning risk of Random Decision Tree (RDT) and show that the upper bound of the risk is stable and the lower bound decreases as the number of trees increases. Importantly, we demonstrate that the training complexity is independent of the number of class labels, which is a significant overhead for many state-of-the-art multi-label methods. This is particularly important for problems with a large number of class labels. Based on these characteristics, we adopt and improve RDT for multi-label classification. Experimental results demonstrate that the computation time of the proposed approaches is 1-3 orders of magnitude less than that of other methods on datasets with large numbers of instances and labels, with accuracy improvements that exceed 10% over a number of state-of-the-art multi-label learning methods on some datasets. Considering efficiency and effectiveness together, Multi-label RDT is the top-ranked algorithm in this domain. Even compared with the HOMER algorithm, which was proposed to handle large numbers of labels, Multi-label RDT trains 2-3 orders of magnitude faster and achieves some improvement in accuracy. Software and datasets are available from the authors.
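The idea behind the label-independent training cost can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names (RandomTree, train_forest, label_scores, predict_labels), the fixed tree depth, the 0.5 decision threshold, and the assumption that features are scaled to [0, 1] are all illustrative choices. Each tree's split tests are drawn at random without looking at the labels, each leaf only counts the labels of the training examples routed to it, and a prediction averages the per-leaf label fractions over the trees.

```python
import random
from collections import Counter


class RandomTree:
    """One random tree (hypothetical sketch): split tests are drawn without using labels."""

    def __init__(self, n_features, depth, rng):
        self.depth = depth
        # One random (feature, threshold) test per internal node, stored in heap order.
        # Assumes every feature value lies in [0, 1].
        self.tests = [(rng.randrange(n_features), rng.random())
                      for _ in range(2 ** depth - 1)]
        self.leaf_total = Counter()   # leaf id -> number of training examples
        self.leaf_labels = {}         # leaf id -> Counter of label occurrences

    def _leaf(self, x):
        # Walk from the root down `depth` levels; the final node id identifies the leaf.
        node = 0
        for _ in range(self.depth):
            feature, threshold = self.tests[node]
            node = 2 * node + (1 if x[feature] < threshold else 2)
        return node

    def fit(self, X, Y):
        # Y holds one iterable of labels (for example a set) per training example.
        for x, labels in zip(X, Y):
            leaf = self._leaf(x)
            self.leaf_total[leaf] += 1
            self.leaf_labels.setdefault(leaf, Counter()).update(labels)

    def label_fractions(self, x):
        # Fraction of training examples in x's leaf that carry each label.
        leaf = self._leaf(x)
        total = self.leaf_total[leaf]
        if total == 0:
            return {}
        return {label: count / total
                for label, count in self.leaf_labels[leaf].items()}


def train_forest(X, Y, n_trees=30, depth=3, seed=0):
    rng = random.Random(seed)
    forest = [RandomTree(len(X[0]), depth, rng) for _ in range(n_trees)]
    for tree in forest:
        tree.fit(X, Y)
    return forest


def label_scores(forest, x):
    # Average the per-tree label fractions over the whole ensemble.
    totals = Counter()
    for tree in forest:
        for label, fraction in tree.label_fractions(x).items():
            totals[label] += fraction
    return {label: total / len(forest) for label, total in totals.items()}


def predict_labels(forest, x, threshold=0.5):
    # The 0.5 cut-off is an illustrative choice, not taken from the paper.
    return {label for label, score in label_scores(forest, x).items()
            if score >= threshold}


if __name__ == "__main__":
    X = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95]]
    Y = [{"sky", "sea"}, {"sky"}, {"city"}, {"city", "night"}]
    forest = train_forest(X, Y)
    print(predict_labels(forest, [0.12, 0.15]))
```

In this sketch, training routes each example through every tree once and increments counters only for the labels that example actually carries, so the work per example depends on the tree depth and the size of its own label set rather than on the total number of distinct labels.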
Similar resources
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
Cost-Sensitive Reference Pair Encoding for Multi-Label Learning
We propose a novel cost-sensitive multi-label classification algorithm called cost-sensitive random pair encoding (CSRPE). CSRPE reduces the cost-sensitive multi-label classification problem to many cost-sensitive binary classification problems through the label powerset approach followed by the classic one-versus-one decomposition. While such a naïve reduction results in exponentially many classi...
Cost Sensitive Ranking Support Vector Machine for Multi-label Data Learning
Multi-label data classification has become an important and active research topic, where the classification algorithm is required to deal with prediction of sets of label indicators for instances simultaneously. Label powerset (LP) method reduces the multi-label classification problem to a single-label multi-class classification problem by treating each distinct combination of labels. However, ...
Large-Scale Multi-Label Learning with Incomplete Label Assignments
Multi-label learning deals with the classification problems where each instance can be assigned with multiple labels simultaneously. Conventional multi-label learning approaches mainly focus on exploiting label correlations. It is usually assumed, explicitly or implicitly, that the label sets for training instances are fully labeled without any missing labels. However, in many real-world multi-...
Multi-label classification with a reject option
We consider multi-label classification problems in application scenarios where classifier accuracy is not satisfactory, but manual annotation is too costly. In single-label problems, a well-known solution consists of using a reject option, i.e., allowing a classifier to withhold unreliable decisions, leaving them (and only them) to human operators. We argue that this solution can be ...
Publication date: 2010